Hybrid Deep-Learning Action Prediction Architecture

ABSTRACT

A hybrid deep-learning action prediction architecture system is described. The architecture includes a main path and an auxiliary path. The main path may contain multiple layers of convolutional neural networks that aggregate input data to coarser time spans. The resultant data produced by the convolutional neural networks is passed to multiple layers of long short-term memory (LSTM) neural networks. The outputs from the LSTMs are then combined with profile data from the auxiliary path to predict an action label.

BACKGROUND

Digital analytics systems are implemented to analyze “big data” (e.g., petabytes of data) to gain insights that cannot be obtained by human users alone. In one such example, digital analytics systems are configured to analyze big data to predict occurrence of future actions, which may support a wide variety of functionality. Prediction of future actions, for instance, may be used to determine when a machine failure is likely to occur, to improve operational efficiency of devices in addressing occurrences of events (e.g., spikes in resource usage), to allocate resources, and so forth.

In other examples, this may be used to predict user actions. Accurate prediction of user actions may be used to manage provision of digital content and resource allocation by service provider systems and thus improve operation of devices and systems that leverage these predictions. Examples of techniques that leverage prediction of user interactions include recommendation systems, digital marketing systems (e.g., to cause conversion of a good or service), and systems that rely on a user's propensity to purchase or cancel a subscription contract, download an application, sign up for an email, and so forth. Thus, prediction of future actions may be used by a wide variety of service provider systems for personalization, customer relationship/success management (CRM/CSM), and so forth for a variety of different entities, e.g., devices and/or users.

Techniques used by conventional digital analytics systems to predict occurrence of future actions, however, face numerous challenges that limit the accuracy of the predictions and involve inefficient use of computational resources. One challenge service provider systems face is customer churn, i.e., loss of customers. In operation, the service provider system may take measures to mitigate customer churn, which are called customer retention measures. Customer retention measures implemented by service provider systems primarily involve targeting customers at a high churn risk identified with a churn prediction model. The churn prediction model is used by the digital analytics system to determine proactive measures for engaging with customers to reduce the risk of churn.

Conventional techniques that use a churn prediction model to predict user actions formulate the problem as binary classification, e.g., predicting whether or not the action occurs. This technique, as implemented by conventional digital analytics systems, uses a feature set for modeling user behavior that includes user profile features and behavior features. User profile features typically include characteristics and properties of users. Behavior features include properties and characteristics of behaviors that a user may exhibit. In conventional digital analytics systems, behavior features are typically handcrafted, i.e., developed manually. While such conventional formulations can be effective to some degree, they have drawbacks that cause inaccurate predictions and inefficient use of computational resources.

In one such example, a technical challenge faced by conventional digital analytics systems involves how to obtain an optimal feature set from handcrafted features and how best to automate feature generation. That is, handcrafted features can fail to account for the technical complexity of the landscape and can thus result in a less than desirable feature set (i.e., one that is not “optimal”) due to the limited knowledge of the user who manually specifies the handcrafted features. Although conventional techniques have been developed to automate feature generation, these techniques are generally slow to train (and thus do not support real time operation) and fail to achieve desirable results because they cannot preserve an adequate amount of information.

Another technical challenge involves how best to increase data utilization by taking multiple historical outcomes for every customer. That is, the “binary classification” approach of conventional methods does not utilize data at a level of granularity that supports robust and accurate prediction outcomes for every customer. As a result of these challenges, conventional digital analytics systems fail to accurately predict actions and involve inefficient use of computational resources.

SUMMARY

To address the above-identified challenges, a deep learning architecture is utilized by a digital analytics system for action prediction, e.g., prediction of user or machine actions. The deep learning architecture implements a model that outperforms conventional models, as illustrated by the performance comparisons of FIGS. 5 and 6, and provides useful insights into those actions, thereby increasing accuracy of the predictions and operational efficiency of computing devices that implement the model.

In one or more implementations, a hybrid deep-learning based, multi-path architecture is employed by a digital analytics system for action prediction. In one example, the architecture includes main and auxiliary paths. The main path includes one or more convolutional neural networks (ConvNets or CNNs), long short-term memory (LSTM) neural networks, and time distributed dense networks. These networks collectively process usage data and, from the auxiliary path, profile data to produce an output in the form of a “label” representing an action that is predicted to happen in the next fixed time window at the end of an LSTM summary time span.

This Summary introduces a selection of concepts in a simplified form that are further described below in the Detailed Description. As such, this Summary is not intended to identify essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS

The detailed description is described with reference to the accompanying figures. Entities represented in the figures may be indicative of one or more entities and thus reference may be made interchangeably to single or plural forms of the entities in the discussion.

FIG. 1 is an illustration of a digital medium environment in an example implementation that is operable to train and use a hybrid deep learning architecture described herein.

FIG. 2 is an illustration of a specific implementation of a hybrid deep learning architecture in accordance with one or more implementations.

FIG. 3 is a flow diagram that describes operations in accordance with one or more implementations.

FIG. 4 illustrates an example specific architectural arrangement of the architecture of FIG. 2 in accordance with one implementation.

FIG. 5 illustrates charts that present performance comparisons between the innovative hybrid deep learning architecture and other baseline approaches.

FIG. 6 illustrates charts that present performance comparisons between the innovative hybrid deep learning architecture and a current production model.

FIG. 7 illustrates an example system including various components of an example device that can be implemented as any type of computing device as described and/or utilized with reference to FIGS. 1 and 2 to implement embodiments of the techniques described herein.

DETAILED DESCRIPTION

Overview

Prediction of occurrence of future actions may be used to support a wide range of functionality by service provider systems as described above, examples of which include device management, control of digital content provided to users, and so forth. Conventional techniques and systems to do so, however, have limited accuracy due to the numerous challenges faced by these systems, including inaccuracies of handcrafted features and the difficulty of obtaining an optimal feature set. Accordingly, service provider systems that employ these conventional techniques are confronted with inefficient use of computational resources to address these inaccuracies. For example, inaccurate prediction of events involving computational resource usage by a service provider system may result in outages in instances in which a spike in usage is not predicted, or in overallocation of resources in instances in which a spike in usage is predicted but does not actually occur. Similar inefficiencies may be experienced in systems that rely on predicting events involving user actions, e.g., churn, upselling, conversion, and so forth.

Accordingly, a hybrid deep learning architecture system is described that overcomes the challenges of conventional systems and thereby supports proactive measures to optimize resource allocation. This includes an ability of the hybrid deep learning architecture system to generate features automatically, such that handcrafted features are no longer required. Additionally, the hybrid deep learning architecture system supports inclusion of profile features, which describe characteristics of an entity (e.g., user or device) associated with the action, through use of an auxiliary path, which improves performance of the model in generating a prediction of the action.

In one example, the hybrid deep learning architecture includes a main path and the auxiliary path described above. The main path is implemented using modules of the hybrid deep learning architecture system to process input data including activity logs that describe activities and the like. User activities as reflected in activity logs can include, by way of example and not limitation, daily product usage summaries such as daily application launch counts, daily total session time of all launches for each application, and the like. The auxiliary path is also implemented using modules of the hybrid deep learning architecture system to process profiles, which may include static profile features and dynamic profile features. Static profile features may refer to time-invariant characteristics such as gender, geographical location, market segments, and the like. Dynamic profile features may refer to characteristics that change over time, such as software subscription age. A connection architecture is then employed by the hybrid deep learning architecture system between the main and auxiliary paths. This enables the main path of the hybrid deep learning architecture system to consider both the static profile features and the dynamic profile features when generating a prediction of an action, e.g., a user action, with increased accuracy. This is not possible using conventional systems, and it also improves data utilization by providing multiple historical outcomes for each user, as further described below.

Furthermore, the challenge of biased data sampling due to label definition is addressed by this architecture. The dual path architecture reduces biased data sampling, at least in part, by utilizing a convolutional neural network system to summarize aggregated user input, such as activity logs, and processing the summarized aggregated user input using a long short-term memory (LSTM) neural network system. The LSTM neural network system of the hybrid deep learning architecture system facilitates classifying, processing, and predicting time series given time lags of unknown size and duration between events. A time distributed dense network system is then used to process the data produced by the LSTM neural network, as well as static and dynamic profile data from the auxiliary path, to provide more robust and accurate labels, which constitute predicted user intended actions that are predicted to happen in the next fixed time window at the end of an LSTM summary time span.

In an implementation example, modules of the main path include one or more convolutional neural networks (ConvNets or CNNs), long short-term memory (LSTM) neural networks, and time distributed dense networks that collectively process user usage data. The modules are also configured to process user profile data from the auxiliary path to produce an output in the form of a “label”, which represents data describing a predicted action, e.g., “what is predicted to happen next” in a fixed time window.

In operation, the hybrid deep-learning architecture system predicts actions using a unique model architecture having a main path and an auxiliary path. The main path contains multiple layers of ConvNets for further aggregation of blocks of usage summary vectors over time spans. The usage summary vectors are based on input data that describes actions over a time span having a first granularity. Aggregation of the blocks of usage summary vectors produces resultant data that summarizes the user actions over a time span that has a second granularity, which is coarser than the first granularity. Aggregation of the blocks reduces noise and training data size and thus improves efficiency in both training and use of the neural networks to generate predictions.

This resultant data is passed to multiple layers of long short-term memory (LSTM) neural networks, which capture long-range interactions in the resultant data passed from the ConvNets. The prediction is then generated using multiple layers of a time distributed, fully connected dense neural network based on the determined long-range interactions and on profile data supplied from the auxiliary path. The profile data, for instance, may describe static characteristics of an entity that corresponds to the action, which do not change over time (e.g., market segments, gender), or dynamic characteristics of the entity, which correspond to a particular time and/or change over time (e.g., subscription age). As a result, accuracy of the prediction using the main path may be improved using profile data of the auxiliary path within this hybrid architecture, as further described below.

In this way, the hybrid deep-learning architecture system for action prediction has several advantages over traditional predictive models. Specifically, the innovative architecture is capable of automatic feature generation without the need for handcrafted features. Thus, the process is highly efficient, automatic, and easily scalable. The architecture also provides multiple outputs for one user, at many time steps of the recurrent LSTM layers, for increased data utilization.

The machine-learning architecture described herein also has advantages over an LSTM-alone architecture. Specifically, the introduction of an auxiliary path enables inclusion of profile features, which in turn improves model performance. The introduction of CNNs into the hybrid deep learning architecture system transforms original summary time steps to coarser granularities, which in turn reduces both noise and training time. Since CNNs can have a complex structure and their weights are learned through training, this form of aggregation is more automatic and can preserve more information than manual aggregation. The hybrid architecture is thus able to train faster and achieve better performance than LSTM-alone architectures, as will become apparent below.

In the following discussion, an example environment is first described that may employ the techniques described herein. Example procedures are also described which may be performed in the example environment as well as other environments. Consequently, performance of the example procedures is not limited to the example environment and the example environment is not limited to performance of the example procedures.

Example Environment

FIG. 1 is an illustration of a digital medium environment 100 in an example implementation that is operable to employ hybrid deep-learning techniques for predicting user intended actions as described herein. The illustrated environment 100 includes a service provider system 102, a digital analytics system 104, and a plurality of client devices, an example of which is illustrated as client device 106. In this example, actions are described involving user actions performed through interaction with client devices 106. Other types of actions are also contemplated, including device actions (e.g., failure, resource usage) that are achieved without user interaction, and so forth. These devices are communicatively coupled, one to another, via a network 108 and may be implemented by a computing device that may assume a wide variety of configurations.

A computing device, for instance, may be configured as a desktop computer, a laptop computer, a mobile device (e.g., assuming a handheld configuration such as a tablet or mobile phone), and so forth. Thus, the computing device may range from full resource devices with substantial memory and processor resources (e.g., personal computers, game consoles) to a low-resource device with limited memory and/or processing resources (e.g., mobile devices). Additionally, although a single computing device is shown, a computing device may be representative of a plurality of different devices, such as multiple servers utilized by a business to perform operations “over the cloud” as shown for the service provider system 102 and the digital analytics system 104 and as further described in FIG. 7.

The client device 106 is illustrated as engaging in user interaction with a service manager module 112 of the service provider system 102. As part of this user interaction, feature data 110 is generated. The feature data 110 describes characteristics of the user interaction in this example, such as demographics of the client device 106 and/or user of the client device 106, network 108, events, locations, and so forth. The service provider system 102, for instance, may be configured to support user interaction with digital content 118. A dataset 114 is then generated (e.g., by the service manager module 112) that describes this user interaction, characteristics of the user interaction, the feature data 110, and so forth, which may be stored in a storage device 116.

Digital content 118 may take a variety of forms, and thus user interaction and associated events with the digital content 118 may also take a variety of forms in this example. A user of the client device 106, for instance, may read an article of digital content 118, view a digital video, listen to digital music, view posts and messages on a social network system, subscribe or unsubscribe, purchase an application, and so forth. In another example, the digital content 118 is configured as digital marketing content to cause conversion of a good or service, e.g., by “clicking” an ad, purchase of the good or service, and so forth. Digital marketing content may also take a variety of forms, such as electronic messages, email, banner ads, posts, articles, blogs, and so forth. Digital marketing content is typically employed to raise awareness of, and drive conversion of, the good or service corresponding to the content. In another example, user interaction, and thus generation of the dataset 114, may also occur locally on the client device 106.

The dataset 114 is received by the digital analytics system 104, which in the illustrated example employs this data to control output of the digital content 118 to the client device 106. To do so, an analytics manager module 122 generates data describing a predicted action, illustrated as predicted action data 124. The predicted action data 124 is used by the digital content control module 126 to control which items of the digital content 118 are output to the client device 106, e.g., directly via the network 108 or indirectly via the service provider system 102.

To generate the predicted action data 124, the analytics manager module 122 implements a hybrid deep learning architecture system 128 having a main path 130 and an auxiliary path 132. The hybrid deep learning architecture system 128 provides an automated learning architecture that overcomes limitations of conventional handcrafted efforts and thus provides an improved feature set that increases accuracy of a model used to generate a prediction of occurrence of an action, e.g., to generate the predicted action data 124.

The hybrid deep learning architecture system 128 solves conventional technical challenges by incorporating a main path 130, which includes modules that implement neural networks to process input data including activity logs and the like, and an auxiliary path 132, which processes profiles (e.g., having static profile features and dynamic profile features). The hybrid deep learning architecture system 128 also includes a connection architecture, implemented as another neural network, between the main and auxiliary paths 130, 132, respectively, to leverage long-range interactions determined from the main path 130 with profile features (e.g., both the static profile features and dynamic profile features) of the auxiliary path 132 to produce predicted intended user actions. This facilitates data utilization by providing multiple historical outcomes for each entity.

The innovative hybrid deep learning architecture system 128 also reduces biased data sampling, at least in part, by utilizing a convolutional neural network system to summarize aggregated user input, such as activity logs, and processing the summarized aggregated user input using a long short-term memory (LSTM) neural network system. The LSTM approach facilitates classifying, processing, and predicting time series given time lags of unknown size and duration between events. A time distributed dense network system is then used to process the data produced by the LSTM neural network, as well as static and dynamic profile data from the auxiliary path 132, to provide more robust and accurate labels, which constitute predicted user intended actions that are predicted to happen in the next fixed time window at the end of an LSTM summary time span.

In the illustrated and described example, and as shown in more detail in FIG. 2, the main path 130 of the hybrid deep learning architecture system 128 includes an input data module 204, a first neural network (e.g., implemented by a convolutional neural network module 206), a second neural network (e.g., implemented by a long short-term memory neural network module 208), and a third neural network (e.g., implemented by a time distributed dense network module 210). The auxiliary path 132 includes a static profile feature module 212 and a dynamic profile feature module 214. The static profile feature module 212 and dynamic profile feature module 214 provide input to the time distributed dense network module 210 to produce an output 216, which, in this example, comprises predicted user action labels. The modules that constitute the main path 130 and auxiliary path 132 can be implemented in any suitable hardware, software, firmware, or combination thereof.

The Main Path—130

In the main path 130, the input data module 204 receives user input data, which is a summary of user product usage activities over certain granularities of time. The granularities of time can vary. The user usage activities can include, by way of example and not limitation, products launched (e.g., which software programs have been launched) or usage of specific features within the products for software companies; product webpage browsing, add-to-cart functionality, or product purchases for ecommerce companies; account activities, credit card usage, or online banking logins for banks and financial institutions; or other relevant product or service usage for companies in various lines of business. The summaries can include, by way of example and not limitation, a sum, mean, minimum, maximum, standard deviation, or other aggregation method applied to counts, time durations of the user activities, and the like. As noted above, granularities of time can include, by way of example and not limitation, minutes, hours, days, weeks, months, or any reasonable time duration. Thus, the user usage summary for each time span at a given granularity can be represented as a vector.
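
As a rough illustration of the summaries described above, the following Python sketch turns a hypothetical per-entity event log of (day index, session minutes) pairs into one daily usage summary vector per day, using the count, sum, mean, minimum, maximum, and standard deviation. The log format, the choice of statistics, and the function name `daily_usage_summaries` are assumptions for illustration only, not the described input data module.

```python
from collections import defaultdict
from statistics import mean, pstdev


def daily_usage_summaries(events, num_days):
    """Build one usage summary vector per day for a single entity.

    Hypothetical input format: `events` is an iterable of
    (day_index, session_minutes) pairs. Each output vector is
    [count, sum, mean, min, max, population std] of that day's sessions;
    days with no activity get an all-zero vector.
    """
    sessions_by_day = defaultdict(list)
    for day, minutes in events:
        if 0 <= day < num_days:  # ignore events outside the observation period
            sessions_by_day[day].append(float(minutes))

    vectors = []
    for day in range(num_days):
        values = sessions_by_day.get(day)
        if not values:
            vectors.append([0.0] * 6)
            continue
        vectors.append([
            float(len(values)),
            sum(values),
            float(mean(values)),
            min(values),
            max(values),
            float(pstdev(values)),
        ])
    return vectors


# Example: two sessions on day 0, none on day 1, one on day 2.
summaries = daily_usage_summaries([(0, 10), (0, 20), (2, 5)], num_days=3)
```

Each inner list is one time step of the first, finer granularity; the sequence of these vectors is what gets divided into blocks next.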

The input data module 204 processes the input data to divide the input data into blocks which contain user usage summary vectors over many time spans.
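
One way to form these blocks is a sliding window over the time-ordered summary vectors. In this minimal Python sketch, which is an assumption rather than the described module, `stride == block_size` yields non-overlapping contiguous blocks and a smaller stride yields partially overlapping blocks, the two block options the convolutional neural network module 206 is described as accepting. Trailing steps that do not fill a complete block are dropped, which is also an assumed policy.

```python
def make_blocks(summary_vectors, block_size, stride=None):
    """Split time-ordered summary vectors into fixed-size blocks.

    stride == block_size -> non-overlapping, contiguous blocks
    stride <  block_size -> partially overlapping blocks
    Trailing steps that cannot fill a whole block are dropped.
    """
    if stride is None:
        stride = block_size
    if block_size <= 0 or stride <= 0:
        raise ValueError("block_size and stride must be positive")
    last_start = len(summary_vectors) - block_size
    return [
        summary_vectors[start:start + block_size]
        for start in range(0, last_start + 1, stride)
    ]


# Two weeks of (toy, one-feature) daily summaries.
daily = [[float(day)] for day in range(14)]
weekly_blocks = make_blocks(daily, block_size=7)           # 2 blocks
overlapping = make_blocks(daily, block_size=7, stride=3)   # starts at days 0, 3, 6
```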

Then, each block of input data is passed to a first neural network of the hybrid deep learning architecture system 128. In the illustrated example, the first neural network is implemented by a convolutional neural network module 206. The convolutional neural network module 206 may include one or more convolutional neural networks (CNNs) that can process data as described above and below. In the present example, the convolutional neural network module 206 is utilized to aggregate usage information at different levels via a configurable kernel size. One example of how this can be done is provided below in the section entitled “Implementation Example”.

The convolutional neural network module 206 is capable of transforming original summary time steps to coarser granularities of time spans. For example, if the original input data received from the input data module 204 is a daily summary, blocks of 7 daily summaries can be passed by the input data module 204 to the convolutional neural network module 206 and processed to output one vector per block. Effectively, in this example, this achieves a weekly summary. It is to be appreciated and understood that this design is more automatic and can capture far richer relations than handcrafted aggregation; the rich relations are learned by training the whole model. With the illustrated and described convolutional neural network module 206, a system may start with a relatively fine granularity time span summary and then transition to a coarser granularity time span summary through the CNNs. Hence, this achieves noise reduction and training data size reduction, and enables the model to train faster without loss of model accuracy. It is to be appreciated and understood that the blocks passed into the convolutional neural network module 206 can be non-overlapping and contiguous, or partially overlapping. Further, in one or more implementations, multiple layers of CNNs can be introduced to perform further summarization, e.g., the convolutional neural network module 206 may include a first CNN (CNN1) and a second CNN (CNN2), as described in more detail in FIG. 3. All of these variations in CNN architecture and block size can be tuned to achieve the best model performance on validation data. Thus, a dynamic and flexibly tunable system can be utilized to quickly and efficiently adapt to different data processing environments.
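
The weekly example above can be pictured as a one-dimensional convolution over time whose kernel size and stride both equal 7. The following dependency-free Python sketch, an illustration and not the trained module, applies such a convolution with a ReLU activation to a sequence of daily summary vectors. The weight layout `[out_channel][kernel_step][in_feature]` and the ReLU are assumptions; in practice a framework layer such as `tf.keras.layers.Conv1D` would learn these weights during training.

```python
def conv1d_aggregate(sequence, weights, bias, kernel_size, stride):
    """Strided 1-D convolution over time with ReLU ("valid" padding).

    sequence: list of T input vectors, each with C_in features.
    weights:  nested list shaped [C_out][kernel_size][C_in].
    bias:     list of C_out values.
    Returns one C_out-dimensional vector per window, i.e. one coarser-grained
    summary per block of `kernel_size` input steps.
    """
    outputs = []
    for start in range(0, len(sequence) - kernel_size + 1, stride):
        window = sequence[start:start + kernel_size]
        out = []
        for w_out, b_out in zip(weights, bias):
            acc = b_out
            for k, step in enumerate(window):
                acc += sum(w * x for w, x in zip(w_out[k], step))
            out.append(max(0.0, acc))  # ReLU activation
        outputs.append(out)
    return outputs


# With every weight fixed at 1/7, the layer reduces to a handcrafted weekly mean;
# training would instead learn weights that can capture richer relations.
daily = [[float(day)] for day in range(14)]
mean_kernel = [[[1.0 / 7] for _ in range(7)]]   # 1 output channel, 7 steps, 1 feature
weekly = conv1d_aggregate(daily, mean_kernel, bias=[0.0], kernel_size=7, stride=7)
```

The fixed-mean kernel is shown only because its output is easy to check; it is exactly the kind of manual aggregation that learned weights are meant to generalize.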

The aggregated output of the convolutional neural network module 206 is provided to a second neural network, which is illustrated as implemented by a long short-term memory (LSTM) neural network module 208. In this particular example, the LSTM is a predicting component of the hybrid deep-learning architecture system 128.

Any number of LSTMs can be used. In at least some implementations, a configuration of two LSTM layers is utilized, as described in more detail in FIG. 4. LSTMs with multiple inputs and outputs are designed in these implementations to capture long-range interactions among aggregated usage across different time frames. Since an LSTM can produce an output at every time step, the model can be trained using action labels at multiple time steps simultaneously, at the minimum time resolution of the LSTM output. In this implementation, the model is trained using TensorFlow, an open-source machine learning framework, by minimizing a loss function to which the labels at the different LSTM output steps contribute simultaneously. Hence, owing to the LSTM architecture (i.e., an output at every time step), the model learns the multiple labels at the same time.
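
The multi-label training objective can be sketched as a loss that averages a per-step binary cross-entropy over every LSTM output step of an entity, so that the labels at all steps contribute to one scalar at the same time. The source states only that TensorFlow minimizes such a loss; the use of binary cross-entropy, the equal weighting of steps, and the optional mask for steps whose label window is not yet observable are assumptions in this plain-Python sketch.

```python
import math


def multi_step_bce(probs, labels, mask=None, eps=1e-7):
    """Mean binary cross-entropy over the output time steps of one entity.

    probs:  predicted action probabilities, one per LSTM output step.
    labels: 0/1 action labels aligned with `probs`.
    mask:   optional booleans; False excludes a step (e.g. an unobservable label).
    """
    total, count = 0.0, 0
    for t, (p, y) in enumerate(zip(probs, labels)):
        if mask is not None and not mask[t]:
            continue
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        count += 1
    return total / count if count else 0.0


# Every unmasked step contributes to one scalar loss.
loss = multi_step_bce([0.5, 0.5], [1, 0])
```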

The output of the long short-term memory neural network module 208 is provided to a third neural network, an illustrated example of which is implemented by a time distributed dense network module 210. The time distributed dense network module 210 also receives a profile from the auxiliary path 132 in the form of one or more of static profile features from the static profile feature module 212 or dynamic profile features from the dynamic profile feature module 214. The profile is incorporated into the model in order to improve performance, as further described in the following section.

The Auxiliary Path—132

In the auxiliary path 132, profiles are taken as inputs to the third neural network of the time distributed dense network module 210 to augment the learning of the hybrid deep learning architecture system 128. In the illustrated and described implementation, profiles can be static, dynamic, or both.

The static profiles are shared across all output time steps after the LSTM output. The dynamic profiles, such as subscription age, are associated with the corresponding output steps for the same entity, e.g., device or user. Specifically, relatively static profiles cover many details including, but not limited to, gender, geographical location, market segments, and so forth. Regarding the representation of subscription age, some implementations may conduct both monthly and annual discretization of age (days since subscription) to capture the corresponding two representative subscription types.
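
A minimal sketch of the monthly and annual discretization of subscription age is shown below. The 30-day month, the 365-day year, the one-hot encoding, and the caps on the number of buckets (`max_months`, `max_years`) are all assumptions chosen for illustration; the source does not specify how the discretized age is encoded.

```python
def one_hot(index, size):
    """One-hot vector of length `size`; indices past the end fall in the last bucket."""
    vec = [0.0] * size
    vec[min(index, size - 1)] = 1.0
    return vec


def subscription_age_features(days_since_subscription, max_months=24, max_years=5):
    """Discretize subscription age (in days) at monthly and annual granularity.

    Assumes 30-day months and 365-day years. The two encodings are intended
    to reflect monthly and annual subscription types, respectively.
    """
    if days_since_subscription < 0:
        raise ValueError("subscription age cannot be negative")
    months = days_since_subscription // 30
    years = days_since_subscription // 365
    return one_hot(months, max_months) + one_hot(years, max_years)


features = subscription_age_features(400)  # month bucket 13, year bucket 1
```

Because subscription age changes over time, a vector like this would be computed per output step, which is what makes it a dynamic profile feature.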

Taken together, for each time step, the output status learned from usage in the main path 130 (the output from the LSTM) and the fused vector of dynamic profiles (such as subscription age) and static profiles are concatenated and then provided as input to the third neural network of the time distributed dense network module 210, which, in this example, comprises fully connected networks that predict the action label, in this case output 216.
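
The per-step fusion can be sketched as follows: at each output step, the LSTM output vector, that step's dynamic profile vector, and the shared static profile vector are concatenated and passed through the same dense weights, which is what makes the layer time distributed. For brevity, this sketch uses a single fully connected layer with a sigmoid output, whereas the described module may stack several; the function names, the concatenation order, and the single-layer simplification are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_action_probs(lstm_outputs, dynamic_profiles, static_profile, weights, bias):
    """Time-distributed dense layer over fused main-path and auxiliary-path inputs.

    lstm_outputs:     [T][H] output status from the LSTM, one vector per step.
    dynamic_profiles: [T][D] step-specific profile vectors (e.g. subscription age).
    static_profile:   [S] profile vector shared by every step.
    weights, bias:    one shared weight vector of length H + D + S and a scalar bias.
    Returns one action probability per output step.
    """
    if len(lstm_outputs) != len(dynamic_profiles):
        raise ValueError("each LSTM output step needs a dynamic profile")
    probs = []
    for h_t, d_t in zip(lstm_outputs, dynamic_profiles):
        x_t = list(h_t) + list(d_t) + list(static_profile)  # concatenate
        if len(x_t) != len(weights):
            raise ValueError("weight length must equal H + D + S")
        z = bias + sum(w * x for w, x in zip(weights, x_t))
        probs.append(sigmoid(z))
    return probs


probs = predict_action_probs(
    lstm_outputs=[[1.0, 2.0], [3.0, 4.0]],
    dynamic_profiles=[[0.0], [1.0]],
    static_profile=[1.0],
    weights=[0.0, 0.0, 1.0, 0.0],  # only the dynamic feature is weighted here
    bias=0.0,
)
```

Sharing one weight vector across steps keeps the parameter count independent of the number of output steps, while the static profile is simply repeated at each step.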

In the illustrated and described example, label definition is straightforward. Since actions such as conversion or churn may happen at any time in the future, the probability of an action happening at a specific moment (an infinitesimal time interval) approaches zero. Hence, for convenience, the model predicts the probability that the action will happen within the next fixed time window, i.e., the cumulative probability over that window. Thus, in the learning architecture, the label is defined as the action happening in the next fixed time window at the end of the LSTM summary time span. This fixed time window can be 1 week, 1 month, 3 months, or any other reasonable time span that fits a particular business requirement. As mentioned previously, action labels can be defined at every fully connected network linking the LSTM output with the auxiliary path, which captures the evolution of the action status of a single entity. This practice also increases data utilization compared with conventional techniques, since a single entity's historical data is utilized multiple times in training.
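
The label definition can be illustrated with a small Python sketch. For each LSTM output step ending on day `e`, the label is 1 if the entity's action occurs within the window `(e, e + window_days]` and 0 otherwise, so a single entity's history yields several labels. Day-indexed action timestamps and the half-open window convention are assumptions.

```python
def window_labels(action_days, step_end_days, window_days):
    """Label each output step by whether an action falls in the next fixed window.

    action_days:   days on which the entity performed the action (e.g. churned).
    step_end_days: the last day summarized by each LSTM output step.
    window_days:   length of the fixed prediction window (e.g. 7 or 30).
    """
    return [
        1 if any(end < day <= end + window_days for day in action_days) else 0
        for end in step_end_days
    ]


# Weekly output steps ending on days 6, 13, ..., 41; the action happens on day 40.
weekly_ends = [6, 13, 20, 27, 34, 41]
one_week = window_labels([40], weekly_ends, window_days=7)
one_month = window_labels([40], weekly_ends, window_days=30)
```

Here `one_week` is `[0, 0, 0, 0, 1, 0]` and `one_month` is `[0, 1, 1, 1, 1, 0]`: a longer window marks more of the entity's steps as positive, which is one reason the window length is treated as a business choice.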

Having considered an example operating environment that includes a hybrid deep learning architecture system 128, consider now example procedures in accordance with one or more implementations.

Example Procedures

The following discussion describes techniques that may be implemented utilizing the previously described systems and devices. Aspects of each of the procedures may be implemented in hardware, firmware, software, or a combination thereof. The procedures are shown as a set of blocks that specify operations performed by one or more devices and are not necessarily limited to the orders shown for performing the operations by the respective blocks. In portions of the following discussion, reference will be made to FIGS. 1 and 2, which constitute but one way of implementing the described functionality.

FIG. 3 depicts a procedure 300 in an example implementation in which a hybrid deep-learning architecture system 128 is utilized to predict action occurrence. As but one example, the various functional blocks about to be described are associated with the architecture described in FIGS. 1 and 2 to give the reader context for one system that can be utilized to implement the described innovation. It is to be appreciated and understood, however, that architectures other than the specifically described architecture of FIGS. 1 and 2 can be utilized without departing from the spirit and scope of the claimed subject matter.

At block 302, input data is received describing a summary of actions performed by a corresponding entity over a first granularity of time span. This operation can be performed, for example, by input data module 204. The input data can include any suitable type of data that describes occurrence of actions over time by an entity, e.g., a device or user. The input data may vary greatly to describe a variety of different entities and the actions associated with them. The entities, for instance, may describe devices, in which case the actions may refer to operations performed by the devices. In another example, the entities reference users, and the actions are those performed by the users, e.g., conversion, signing up for a subscription, and so forth. In addition, the time span granularity can vary depending on such things as the nature of the entities and actions that are processed by the hybrid deep learning architecture system 128.

At block 304, the input data is processed to generate blocks containing summary vectors over a plurality of time spans. This operation can be performed, for example, by input data module 204. At block 306, the blocks of user usage summary vectors are aggregated to generate a summary of actions over a second, coarser granularity of time span. In one or more implementations, this operation can be performed by a convolutional neural network module 206, which may include one or more CNNs to facilitate aggregation at different levels. Aggregation of blocks can result in daily summaries being aggregated into weekly summaries, weekly summaries being aggregated into monthly summaries, and so on. In some instances, one CNN may aggregate the daily summaries into weekly summaries, and another CNN may aggregate the weekly summaries into monthly summaries.
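The aggregation step can be pictured as a strided one-dimensional convolution over the time axis. The following is a minimal pure-Python sketch of that operation, not the module's actual implementation, and the name `strided_conv1d` is hypothetical. With a hand-set kernel of ones whose size equals its stride, the convolution reduces to a plain weekly sum. A learned kernel can instead weight days differently, and several kernels yield several output channels.

```python
from typing import List, Sequence

Series = Sequence[Sequence[float]]  # [time step][feature]


def strided_conv1d(x: Series, kernels: Sequence[Series], bias: Sequence[float],
                   stride: int) -> List[List[float]]:
    """Valid (unpadded) 1-D convolution over time.

    x:       [time][in_features]
    kernels: [out_channels][kernel_size][in_features]
    returns: [out_time][out_channels], out_time = (len(x) - k) // stride + 1
    """
    k = len(kernels[0])
    out = []
    for start in range(0, len(x) - k + 1, stride):
        window = x[start:start + k]
        out.append([
            sum(kernel[j][f] * window[j][f]
                for j in range(k) for f in range(len(window[j]))) + b
            for kernel, b in zip(kernels, bias)
        ])
    return out


# Two weeks of a single daily feature (e.g., launch count).
daily = [[1], [0], [2], [0], [0], [3], [1], [0], [0], [0], [5], [0], [0], [0]]
# A kernel of ones with size 7 and stride 7 reduces to a plain weekly sum.
weekly = strided_conv1d(daily, kernels=[[[1.0]] * 7], bias=[0.0], stride=7)
print(weekly)  # [[7.0], [5.0]]
```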

At block 308, the summary over the second, coarser granularity of time span is processed by a second neural network to determine long-range interactions across different time frames. This operation can be performed by the second neural network as implemented by a long short-term memory neural network module 208.

At block 310, the captured long-range interactions are processed by a third neural network, together with a profile obtained from the auxiliary path, to predict action labels. The profile may include static profile features, dynamic profile features, or both, as described above. In one implementation, this operation can be performed by the third neural network as implemented by the time-distributed dense network module 210.

Consider now an implementation example that illustrates various advantages of the described innovation over conventional systems.

Implementation Example

To illustrate the above-described hybrid deep-learning architecture based on the multi-path algorithm for action prediction, the following demonstration describes a specific application of the innovation to predicting customer churn for Adobe products. The model was developed based on historical data of Adobe users of seven products (Photoshop, Illustrator, Lightroom, etc.) from Apr. 1, 2014 to May 31, 2017. Churned users (positive examples) and active users (negative examples) were sampled at a 1:1 ratio to form the training data, with about 660,000 training examples.
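The source states only that positives and negatives were sampled at a 1:1 ratio. The sketch below assumes this was done by randomly downsampling the larger class, which is a common way to do it. The helper `balance_one_to_one` and the user IDs are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def balance_one_to_one(positives: Sequence[T], negatives: Sequence[T],
                       seed: int = 0) -> List[Tuple[T, int]]:
    """Downsample the larger class so that both classes have equal counts.

    Returns shuffled (example, label) pairs with label 1 for positives
    (e.g., churned users) and 0 for negatives (e.g., active users).
    """
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    data = [(p, 1) for p in rng.sample(list(positives), n)]
    data += [(q, 0) for q in rng.sample(list(negatives), n)]
    rng.shuffle(data)
    return data


churned = ["u1", "u2", "u3"]  # hypothetical user IDs
active = [f"a{i}" for i in range(10)]
train = balance_one_to_one(churned, active)
print(len(train), sum(label for _, label in train))  # 6 3
```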

In this specific implementation example, the raw input data into the architecture was the daily product usage summary. Specifically, the input data included the daily launch count and the daily total session time of all launches for each of the seven products. In this manner, 14 daily usage summary features (two features for each of the seven products) form each feature vector, and 360 of these daily summary feature vectors were created for each user to form the raw input data processed by the input data module 204 in FIG. 2.
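The 360 × 14 input for one user could be assembled as shown below. This is a minimal sketch under stated assumptions. The interleaved feature order (launches, then session time, per product), the seconds unit for session time, the placeholder product names, and the helper `build_daily_features` are all hypothetical and do not come from the source.

```python
from typing import Dict, List, Sequence


def build_daily_features(launches: Dict[str, Sequence[int]],
                         session_time: Dict[str, Sequence[float]],
                         products: Sequence[str],
                         num_days: int = 360) -> List[List[float]]:
    """Return num_days rows of 2 * len(products) features each.

    Row layout (an assumption): [launches_p1, session_p1, launches_p2, ...].
    """
    rows = []
    for day in range(num_days):
        row: List[float] = []
        for product in products:
            row.append(float(launches[product][day]))
            row.append(float(session_time[product][day]))
        rows.append(row)
    return rows


products = [f"product_{i}" for i in range(1, 8)]  # hypothetical names for 7 products
launches = {p: [0] * 360 for p in products}
session_time = {p: [0.0] * 360 for p in products}  # assumed unit: seconds
launches["product_1"][0] = 3
session_time["product_1"][0] = 1800.0
x = build_daily_features(launches, session_time, products)
print(len(x), len(x[0]))  # 360 14
```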

The architecture and module associations used in this particular example are represented in FIG. 4 generally at 400. In this particular implementation example, two ConvNets 402, 404 (ConvNet1 and ConvNet2) are chosen to constitute the convolutional neural network module 206, and two LSTMs 406, 408 (LSTM1 and LSTM2) are chosen to constitute the long short-term memory neural network module 208 (FIG. 2). In operation, 360 daily summary feature vectors of length 14 are fed into ConvNet1 402 (32 kernels with size of 2 and stride of 2), followed by ConvNet2 404 (32 kernels with size of 5 and stride of 5). Because each kernel size equals its stride, ConvNet1 reduces the 360 daily steps to 180 and ConvNet2 reduces those to 36, so each resulting step summarizes 10 days. The resultant 36 output feature vectors of length 32 are then fed into LSTM1 406 with 36 recurrent layers (64 kernels each layer) and 36 output units, which is further followed by LSTM2 408 with 36 recurrent layers (64 kernels each layer) and 12 output units. The respective LSTM outputs and the profile features from auxiliary path 108 are then integrated and fed to two-layer dense neural networks 410, 412 (time-distributed dense network module 210) of 40 and 20 nodes to predict churn labels.
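The time-axis arithmetic in this configuration can be checked with a short sketch. It assumes unpadded ("valid") convolutions. The closing lines also encode one plausible reading of the 12 output units of LSTM2: an output taken every third 10-day step, i.e., every 30 days. That reading is an assumption, though it is consistent with the 30-day label interval described in the next paragraph.

```python
from typing import Sequence, Tuple


def conv_out_len(length: int, kernel: int, stride: int) -> int:
    """Length of an unpadded ('valid') 1-D convolution output."""
    return (length - kernel) // stride + 1


def trace_time_axis(num_days: int,
                    conv_layers: Sequence[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (time steps, days per step) after the stacked ConvNets.

    days-per-step = product of strides holds here because kernel == stride,
    i.e., the convolution windows do not overlap.
    """
    steps, days_per_step = num_days, 1
    for kernel, stride in conv_layers:
        steps = conv_out_len(steps, kernel, stride)
        days_per_step *= stride
    return steps, days_per_step


steps, days = trace_time_axis(360, [(2, 2), (5, 5)])  # ConvNet1, ConvNet2
print(steps, days)  # 36 10
# Assumption: LSTM2 emits an output every 30 days, i.e., every third step.
print(steps // (30 // days))  # 12
```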

The static profile features (static profile feature module 212) in the auxiliary path 108 are composed of geographical location and market segment, which are copied and fed to the dense neural networks 410, 412. The dynamic profile features (dynamic profile feature module 214), such as the user subscription age, are fed into the dense neural networks 410, 412 at every LSTM output step together with the corresponding output values. The churn labels only appear at the final output at a 30-day interval. Churn is defined in this instance as un-subscription, or no renewal after subscription expiration, in the next 30 days following the end of the feature summary window.
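The churn definition above can be written as a small labeling rule. The following is a hedged sketch rather than the production labeling logic. It assumes integer day indices, a window that starts just after the feature window ends and includes its last day, and a precomputed `renewed` flag. The function name and parameters are hypothetical.

```python
from typing import Optional


def churn_label(window_end: int, horizon: int, cancel_day: Optional[int],
                expiry_day: Optional[int], renewed: bool) -> int:
    """Return 1 if the user churns in (window_end, window_end + horizon].

    Churn here means an explicit cancellation inside the horizon, or a
    subscription expiring inside the horizon without being renewed.
    Days are integer day indices; None means "no such event".
    """
    def in_horizon(day: Optional[int]) -> bool:
        return day is not None and window_end < day <= window_end + horizon

    if in_horizon(cancel_day):
        return 1
    if in_horizon(expiry_day) and not renewed:
        return 1
    return 0


# Feature window ends on day 360; 30-day horizon as in the example above.
print(churn_label(360, 30, cancel_day=None, expiry_day=375, renewed=False))  # 1
print(churn_label(360, 30, cancel_day=None, expiry_day=375, renewed=True))   # 0
```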

It is noted that the chosen specific variation is only for demonstration purposes, considering both simplicity and performance. It is to be appreciated and understood that while the implementation example used a specific number of ConvNets and LSTMs, the techniques and system described herein can be employed using combinations of any number of ConvNets and RNNs/LSTMs connected in a similar manner as described above, regardless of any variation in the associated model hyper-parameters, such as the number of ConvNets and LSTMs, the number of input feature vectors passed to the ConvNets, the kernel number and size (aggregation granularity) of different layers, and the number of final output units.

For purposes of evaluation, the performance of this innovative realization (annotated as “DLChurn” in FIG. 5) was compared with other conventional methods in two scenarios. The first scenario focused on users who were still active on May 31, 2017. The churn probability in the next month (Jun. 1 to Jun. 30, 2017) predicted by the techniques described herein is compared with that of different baseline models: naïve logistic regression (LR_Naive), logistic regression with multi-snapshot data (LR_MS), and random forest with multi-snapshot data (RF_MS). The results are reported in FIG. 5 at 500.

FIG. 5 shows performance comparisons of the techniques described herein against the other baselines in terms of the metrics Area under the Receiver Operating Characteristic Curve (AUC@ROC), Area under the Precision-Recall Curve (AUC@PR), Matthews correlation coefficient (MCC), and F1 score.

These comparisons clearly indicate that the hybrid deep-learning action prediction architecture significantly outperforms other popular conventional methods. For AUC@ROC, a higher value means that the model is better at ranking positive actions above negative ones. For AUC@PR, precision is the fraction of true positives out of all the examples that the model predicts as positive (above a certain score threshold), and recall is the fraction of all positives that the model retrieves as true positives (above that threshold). The PR curve plots precision against recall at different model score thresholds, and higher values mean that the precision of the model is higher across recalls. The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary (two-class) classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure that can be used even if the classes are of very different sizes. The F1 score is the harmonic mean of precision and recall and therefore balances the two.
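For readers who want to reproduce these metrics, the sketch below implements F1, MCC, and ROC AUC in plain Python directly from their definitions. It is an illustrative sketch, not the evaluation code used for FIG. 5. AUC@PR is omitted for brevity. The 0.5 decision threshold and the convention of returning 0.0 for degenerate denominators are assumptions.

```python
import math
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # the threshold is an arbitrary choice
tp, fp, fn, tn = confusion(y_true, y_pred)
print(f1_score(tp, fp, fn), mcc(tp, fp, fn, tn), roc_auc(y_true, scores))  # 0.5 0.0 0.75
```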

In the second scenario, a comparison is made with current production models on users who were active at the beginning of July 2017. As the results in FIG. 6 at 600 show, the hybrid deep-learning action prediction architecture exhibits improved performance over conventional predictive models.

The illustrated results show performance comparisons of the hybrid deep-learning action prediction architecture against conventional production models in terms of the metrics Area under the Receiver Operating Characteristic Curve (AUC@ROC), Area under the Precision-Recall Curve (AUC@PR), Matthews correlation coefficient (MCC), and F1 score.

Example System and Device

FIG. 7 illustrates an example system generally at 700 that includes an example computing device 702 that is representative of one or more computing systems and/or devices that may implement the various techniques described herein. This is illustrated through inclusion of the hybrid deep learning architecture system 128. The computing device 702 may be, for example, a server of a service provider, a device associated with a client (e.g., a client device), an on-chip system, and/or any other suitable computing device or computing system.

The example computing device 702 as illustrated includes a processing system 704, one or more computer-readable media 706, and one or more I/O interfaces 708 that are communicatively coupled, one to another. Although not shown, the computing device 702 may further include a system bus or other data and command transfer system that couples the various components, one to another. A system bus can include any one or combination of different bus structures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus that utilizes any of a variety of bus architectures. A variety of other examples are also contemplated, such as control and data lines.

The processing system 704 is representative of functionality to perform one or more operations using hardware. Accordingly, the processing system 704 is illustrated as including hardware elements 710 that may be configured as processors, functional blocks, and so forth. This may include implementation in hardware as an application specific integrated circuit or other logic device formed using one or more semiconductors. The hardware elements 710 are not limited by the materials from which they are formed or the processing mechanisms employed therein. For example, processors may be composed of semiconductor(s) and/or transistors (e.g., electronic integrated circuits (ICs)). In such a context, processor-executable instructions may be electronically-executable instructions.

The computer-readable storage media 706 is illustrated as including memory/storage 712. The memory/storage 712 represents memory/storage capacity associated with one or more computer-readable media. The memory/storage component 712 may include volatile media (such as random access memory (RAM)) and/or nonvolatile media (such as read only memory (ROM), Flash memory, optical disks, magnetic disks, and so forth). The memory/storage component 712 may include fixed media (e.g., RAM, ROM, a fixed hard drive, and so on) as well as removable media (e.g., Flash memory, a removable hard drive, an optical disc, and so forth). The computer-readable media 706 may be configured in a variety of other ways as further described below.

Input/output interface(s) 708 are representative of functionality to allow a user to enter commands and information to computing device 702, and also allow information to be presented to the user and/or other components or devices using various input/output devices. Examples of input devices include a keyboard, a cursor control device (e.g., a mouse), a microphone, a scanner, touch functionality (e.g., capacitive or other sensors that are configured to detect physical touch), a camera (e.g., which may employ visible or non-visible wavelengths such as infrared frequencies to recognize movement as gestures that do not involve touch), and so forth. Examples of output devices include a display device (e.g., a monitor or projector), speakers, a printer, a network card, a tactile-response device, and so forth. Thus, the computing device 702 may be configured in a variety of ways as further described below to support user interaction.

Various techniques may be described herein in the general context of software, hardware elements, or program modules. Generally, such modules include routines, programs, objects, elements, components, data structures, and so forth that perform particular tasks or implement particular abstract data types. The terms “module,” “functionality,” and “component” as used herein generally represent software, firmware, hardware, or a combination thereof. The features of the techniques described herein are platform-independent, meaning that the techniques may be implemented on a variety of commercial computing platforms having a variety of processors.

An implementation of the described modules and techniques may be stored on or transmitted across some form of computer-readable media. The computer-readable media may include a variety of media that may be accessed by the computing device 702. By way of example, and not limitation, computer-readable media may include “computer-readable storage media” and “computer-readable signal media.”

“Computer-readable storage media” may refer to media and/or devices that enable persistent and/or non-transitory storage of information in contrast to mere signal transmission, carrier waves, or signals per se. Thus, computer-readable storage media refers to non-signal bearing media. The computer-readable storage media includes hardware such as volatile and non-volatile, removable and non-removable media and/or storage devices implemented in a method or technology suitable for storage of information such as computer readable instructions, data structures, program modules, logic elements/circuits, or other data. Examples of computer-readable storage media may include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, hard disks, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or other storage device, tangible media, or article of manufacture suitable to store the desired information and which may be accessed by a computer.

“Computer-readable signal media” may refer to a signal-bearing medium that is configured to transmit instructions to the hardware of the computing device 702, such as via a network. Signal media typically may embody computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as carrier waves, data signals, or other transport mechanism. Signal media also include any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media.

As previously described, hardware elements 710 and computer-readable media 706 are representative of modules, programmable device logic and/or fixed device logic implemented in a hardware form that may be employed in some embodiments to implement at least some aspects of the techniques described herein, such as to perform one or more instructions. Hardware may include components of an integrated circuit or on-chip system, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a complex programmable logic device (CPLD), and other implementations in silicon or other hardware. In this context, hardware may operate as a processing device that performs program tasks defined by instructions and/or logic embodied by the hardware as well as hardware utilized to store instructions for execution, e.g., the computer-readable storage media described previously.

Combinations of the foregoing may also be employed to implement various techniques described herein. Accordingly, software, hardware, or executable modules may be implemented as one or more instructions and/or logic embodied on some form of computer-readable storage media and/or by one or more hardware elements 710. The computing device 702 may be configured to implement particular instructions and/or functions corresponding to the software and/or hardware modules. Accordingly, implementation of a module that is executable by the computing device 702 as software may be achieved at least partially in hardware, e.g., through use of computer-readable storage media and/or hardware elements 710 of the processing system 704. The instructions and/or functions may be executable/operable by one or more articles of manufacture (for example, one or more computing devices 702 and/or processing systems 704) to implement techniques, modules, and examples described herein.

The techniques described herein may be supported by various configurations of the computing device 702 and are not limited to the specific examples of the techniques described herein. This functionality may also be implemented all or in part through use of a distributed system, such as over a “cloud” 714 via a platform 716 as described below.

The cloud 714 includes and/or is representative of a platform 716 for resources 718. The platform 716 abstracts underlying functionality of hardware (e.g., servers) and software resources of the cloud 714. The resources 718 may include applications and/or data that can be utilized while computer processing is executed on servers that are remote from the computing device 702. Resources 718 can also include services provided over the Internet and/or through a subscriber network, such as a cellular or Wi-Fi network.

The platform 716 may abstract resources and functions to connect the computing device 702 with other computing devices. The platform 716 may also serve to abstract scaling of resources to provide a corresponding level of scale to encountered demand for the resources 718 that are implemented via the platform 716. Accordingly, in an interconnected device embodiment, implementation of functionality described herein may be distributed throughout the system 700. For example, the functionality, i.e., the hybrid deep learning architecture system 128, may be implemented in part on the computing device 702 as well as via the platform 716 that abstracts the functionality of the cloud 714.

CONCLUSION

The hybrid deep-learning architecture system described above is able to predict user-intended actions more quickly and efficiently, which is of great business value to companies. As noted above, the unique model architecture is composed of a main path and an auxiliary path. The main path may contain multiple layers of convolutional neural networks for further aggregation to coarser time spans. The resultant data produced by the convolutional neural networks is passed to multiple layers of LSTMs. The outputs from the LSTMs are then combined with the user profile in the auxiliary path to predict the user-intended action label.

This unique model architecture has several advantages over traditional methods of predicting user actions. Specifically, the architecture is capable of automatic feature generation, and hence handcrafted features are no longer needed. Furthermore, the architecture provides multiple outputs for one user at many recurrent layers of the LSTMs for increased data utilization.

This formulation also has advantages over LSTM-alone architectures. Specifically, the introduction of the auxiliary path enables inclusion of profile features, which improves model performance. In addition, the introduction of convolutional neural networks transforms the original summary time steps to coarser granularities, which reduces both noise and training time. Since convolutional neural networks can have a complex structure and their weights are learned through training, this way of aggregation is more automatic and can preserve more information than manual aggregation. The hybrid convolutional neural network and LSTM architecture is able to train faster and achieve better performance than an LSTM-alone architecture.

Although the invention has been described in language specific to structural features and/or methodological acts, it is to be understood that the invention defined in the appended claims is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as example forms of implementing the claimed invention.

What is claimed is:
 1. In a digital medium action prediction environment, a method implemented by at least one computing device, the method comprising: generating, by the at least one computing device, a summary of actions over a time span from input data by aggregating blocks of usage summary vectors using a first neural network of a first path of a machine-learning network architecture; determining, by the at least one computing device, long range interactions across different timeframes from the summary using a second neural network of the first path; obtaining, by the at least one computing device, a profile from a second path of the machine-learning network architecture, the profile describing characteristics of an entity associated with the actions; and generating, by the at least one computing device, a prediction of an action by a third neural network based on the obtained profile from the second path and the determined long range interactions across the different timeframes from the first path of the machine-learning network architecture.
 2. The method as described in claim 1, wherein the second neural network used for the determining of long range interactions is a long short term memory (LSTM) neural network.
 3. The method as described in claim 1, wherein the first neural network used for the generating of the summary of actions is a convolutional neural network.
 4. The method as described in claim 1, wherein the third neural network used for the generating of the prediction is a time-distributed dense neural network.
 5. The method as described in claim 1, wherein the first neural network includes first and second convolutional neural networks, the second neural network includes first and second long short term memory (LSTM) neural networks, and the third neural network includes first and second time-distributed fully connected dense neural networks.
 6. The method as described in claim 1, wherein the entity is a device and the action is an operation performed by the device.
 7. The method as described in claim 1, wherein the entity is a user and the actions are performed by the user.
 8. The method as described in claim 1, wherein the profile is a static profile that is shared across each of the different timeframes.
 9. The method as described in claim 1, wherein the profile is a dynamic profile that is shared with a corresponding time of the different timeframes.
 10. The method as described in claim 1, further comprising generating, by the at least one computing device, the blocks that contain usage summary vectors over a plurality of time spans based on input data describing the actions over time span having a first granularity and wherein the generating of the summary has a second granularity that is coarser than the first granularity.
 11. In a digital medium action prediction environment, a machine-learning architecture system for predicting intended actions comprising: a first neural network implemented by at least one computing device to generate a summary of actions over a time span from input data by aggregating blocks of usage summary vectors; a second neural network implemented by the at least one computing device to determine long range interactions across different timeframes from the summary; a profile feature module implemented by the at least one computing device to obtain a profile describing characteristics of an entity associated with the actions; and a third neural network implemented by the at least one computing device to generate a prediction of an action based on the profile from the profile feature module and the determined long range interactions across the different timeframes from the second neural network.
 12. The system as described in claim 11, wherein the first and second neural networks form a first path in the machine-learning architecture system and the profile feature module forms a second path in the machine-learning architecture system, the first and second paths joined at the third neural network.
 13. The system as described in claim 11, wherein the first neural network is a convolutional neural network.
 14. The system as described in claim 11, wherein the second neural network is a long short term memory (LSTM) neural network.
 15. The system as described in claim 11, wherein the third neural network is a time-distributed dense neural network.
 16. The system as described in claim 11, wherein the first neural network includes first and second convolutional neural networks, the second neural network includes first and second long short term memory (LSTM) neural networks, and the third neural network includes first and second time-distributed fully connected dense neural networks.
 17. The system as described in claim 11, wherein the entity is a device and the action is an operation performed by the device.
 18. The system as described in claim 11, wherein the entity is a user and the actions are performed by the user.
 19. The system as described in claim 11, further comprising an input data module implemented by the at least one computing device to generate the blocks that contain usage summary vectors over a plurality of time spans based on input data describing the actions over time span having a first granularity and wherein the summary has a second granularity that is coarser than the first granularity.
 20. In a digital medium action prediction environment, a machine-learning architecture system for predicting intended actions comprising: means for generating a summary of actions over a time span from input data by aggregating blocks of usage summary vectors; means for determining long range interactions across different timeframes from the summary; means for obtaining a profile describing characteristics of an entity associated with the actions; and means for generating a prediction of an action based on the profile and the determined long range interactions across the different timeframes.